The wide availability of digitized text has created new opportunities for understanding social phenomena. Relatedly, social issues like toxicity, discrimination, and propaganda frequently manifest in text, making text analysis critical for understanding and mitigating them. This course centers on one question: how can we use NLP as a tool for understanding society? Students will learn core methods and recent advances in text analysis, building from word-level metrics to embeddings and language models, and incorporating statistical methods such as time series analysis and causal inference.

Prerequisites: one of (EN.601.465/665, EN.601.467/667, EN.601.468/668) and familiarity with Python/PyTorch. Students may receive credit for EN.601.472 or EN.601.672, but not both.



Schedule

The current class schedule is below and is subject to change:

Date Topic Reference Readings Work Due
Mon Jan 22 Introduction, course expectations [slides]
Wed Jan 24 Word statistics [slides]
  1. Monroe et al. "Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict", Political Analysis (2008)
  2. Jurafsky and Martin, Speech and Language Processing, 3rd ed (2023) [Sec 6.6]
Mon Jan 29 Topic Modeling [slides]
  1. Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3.Jan (2003): 993-1022.
  2. Roberts, Margaret E., et al. "The structural topic model and applied social science." Advances in neural information processing systems workshop on topic models: computation, application, and evaluation. Vol. 4. No. 1. 2013
Submit course goals form linked on Piazza
Wed Jan 31 Word Embeddings: Construction [slides]
  1. Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space."
  2. Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." NeurIPS (2013).
HW 1 Released
Mon Feb 5 Word Embeddings: Applications and Evaluations [slides]
  1. Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings." NeurIPS (2016).
  2. Hamilton, William L., Jure Leskovec, and Dan Jurafsky. "Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change." ACL (2016).
  3. Garg, Nikhil, et al. "Word embeddings quantify 100 years of gender and ethnic stereotypes." PNAS (2018)
  4. Joseph, Kenneth, and Jonathan Morgan. "When do Word Embeddings Accurately Reflect Surveys on our Beliefs About People?." ACL (2020).
Wed Feb 7 Affect and Lexicons [slides]
  1. Giovanna Colombetti. "From affect programs to dynamical discrete emotions", Philosophical Psychology (2009).
  2. Saif M. Mohammad. "Obtaining Reliable Human Ratings of Valence, Arousal, and Dominance for 20,000 English Words." ACL (2018)
  3. Hamilton, William L., et al. "Inducing domain-specific sentiment lexicons from unlabeled corpora." EMNLP. (2016)
Mon Feb 12 Data annotating [slides]
  1. Zeerak Waseem. "Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter". In Proceedings of the First Workshop on NLP and Computational Social Science at ACL (2016)
  2. Luke Breitfeller, Emily Ahn, David Jurgens, and Yulia Tsvetkov. "Finding Microaggressions in the Wild: A Case for Locating Elusive Phenomena in Social Media Posts". In EMNLP (2019)
  3. ICML 2023 tutorial on RLHF
HW 1 due HW 2 released
Wed Feb 14 Class cancelled HW 1 due
Mon Feb 19 Classification Models [slides]
  1. Jurafsky & Martin Chap. 5
  2. Jurafsky & Martin Chap. 7
  3. Keith, Katherine, and Brendan O’Connor. "Uncertainty-aware generative models for inferring document class prevalence." EMNLP (2018)
Wed Feb 21 Hypothesis testing; Causal inference: Definitions [slides] HW 2 due
Mon Feb 26 Guest Lecture from Adam Koon [slides]
Wed Feb 28 Causal inference: Adjustments [slides]
  1. Brady Neal, “Introduction to Causal Inference from a Machine Learning Perspective”, Course Lecture Notes, Chapter 2
  2. Stuart EA. Matching methods for causal inference: A review and a look forward. Stat Sci. 2010 Feb 1;25(1):1-21. doi: 10.1214/09-STS313
  3. Chesnaye NC, Stel VS, Tripepi G, Dekker FW, Fu EL, Zoccali C, Jager KJ. An introduction to inverse probability of treatment weighting in observational research. Clin Kidney J. 2021 Aug 26;15(1):14-20. doi: 10.1093/ckj/sfab158
  4. Austin PC. An Introduction to Propensity Score Methods for Reducing the Effects of Confounding in Observational Studies. Multivariate Behav Res. 2011 May;46(3):399-424. doi: 10.1080/00273171.2011.568786.
HW 3 released
Mon Mar 4 Causal inference: Text and NLP [slides]
  1. Keith, Katherine, David Jensen, and Brendan O’Connor. "Text and Causal Inference: A Review of Using Text to Remove Confounding from Causal Estimates." ACL. 2020.
  2. Roberts, Margaret E., Brandon M. Stewart, and Richard A. Nielsen. "Adjusting for confounding with text matching." American Journal of Political Science 64.4 (2020): 887-903.
  3. Veitch, Victor, Dhanya Sridhar, and David Blei. "Adapting text embeddings for causal inference." Conference on Uncertainty in Artificial Intelligence. PMLR, 2020.
  4. Field, Anjalie, and Yulia Tsvetkov. "Unsupervised Discovery of Implicit Gender Bias." Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.
Wed Mar 6 Network metrics [slides]
  1. Grover, A., & Leskovec, J. (2016, August). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 855-864).
  2. Xu, K., Hu, W., Leskovec, J., & Jegelka, S. (2018). How powerful are graph neural networks?. ICLR (2019).
  3. Yuan, H., Yu, H., Gui, S., & Ji, S. (2022). Explainability in graph neural networks: A taxonomic survey. IEEE transactions on pattern analysis and machine intelligence, 45(5), 5782-5799.
HW3 Due Friday 3/8
Mon Mar 11 Midterm Review [slides]
Wed Mar 13 Midterm HW3 Due Friday 3/15
Mon Mar 18 BREAK
Wed Mar 20 BREAK
Mon Mar 25 Ethics [slides]
  1. Bianchi, Federico, et al. "Easily accessible text-to-image generation amplifies demographic stereotypes at large scale." FAccT. 2023.
  2. Sap, Maarten, et al. "The Risk of Racial Bias in Hate Speech Detection." ACL. 2019
  3. Blodgett, Su Lin, et al. "Language (Technology) is Power: A Critical Survey of “Bias” in NLP" ACL. 2020
Project Proposal Released
Wed Mar 27 Language Modeling (Background) [slides]
  1. Jurafsky and Martin, Speech and Language Processing, 3rd ed (2023) [Sec 3]
  2. Jurafsky and Martin, Speech and Language Processing, 3rd ed (2023) [Sec 7.6]
Mon Apr 1 "Mapping the Modern Agora and Other Applications to Dilemmas in Democracy" with guest Hahrie Han
  1. de Vries, M., Kim, J.Y. & Han, H. The unequal landscape of civic opportunity in America. Nature Human Behaviour 8, 256–263 (2024).
Wed Apr 3 Language Modeling: MLM Use Cases [slides]
  1. The Illustrated Transformer
  2. Card, Dallas, et al. "Computational analysis of 140 years of US political speeches reveals more positive but increasingly polarized framing of immigration." Proceedings of the National Academy of Sciences 119.31 (2022)
  3. Myra Cheng, et al. "AnthroScore: A Computational Linguistic Measure of Anthropomorphism" EACL (2024).
Mon Apr 8 History Applications (with Louis Hyman and Sam Backer)
Wed Apr 10 Language Modeling: Neural Topic Models [slides]
  1. Bianchi, F., Terragni, S., & Hovy, D. (2021). Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence. ACL.
  2. Grootendorst, Maarten. "BERTopic: Neural topic modeling with a class-based TF-IDF procedure." arXiv preprint arXiv:2203.05794 (2022).
  3. Pham, Chau Minh, et al. "TopicGPT: A prompt-based topic modeling framework." NAACL (2024).
HW 4 Released (Due April 19)
Mon Apr 15 Language Modeling: Prompting [slides]
  1. Caleb Ziems, William Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, and Diyi Yang. "Can Large Language Models Transform Computational Social Science?" Computational Linguistics, Mar 2024
  2. Brown et al. "Language Models are Few-Shot Learners", 2020
Wed Apr 17 Language Modeling: AI for Social Experiments with Ziang Xiao
  1. Ludwig and Mullainathan (2023) "Machine learning as a tool for hypothesis generation". The Quarterly Journal of Economics
  2. Christopher Bail (2023) "Can generative AI improve social science?"
  3. Xiao et al. (2020) "Tell Me About Yourself: Using an AI-Powered Chatbot to Conduct Conversational Surveys with Open-ended Questions" ACM Transactions on Computer-Human Interaction
HW 4 due (on April 19)
Mon Apr 22 Sociology Applications [slides]
  1. Goldberg, A., & Stein, S. K. (2018). Beyond social contagion: Associative diffusion and the emergence of cultural variation. American Sociological Review, 83(5), 897-932.
  2. Kozlowski, A. C., Taddy, M., & Evans, J. A. (2019). The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), 905-949.
  3. Arseniev-Koehler, A., Cochran, S. D., Mays, V. M., Chang, K. W., & Foster, J. G. (2022). Integrating topic modeling and word embedding to characterize violent deaths. Proceedings of the National Academy of Sciences, 119(10), e2108801119.
  4. Rodriguez, P. L., Spirling, A., & Stewart, B. M. (2023). Embedding regression: Models for context-specific description and inference. American Political Science Review, 117(4), 1255-1274.
Wed Apr 24 Language Modeling: Social Simulations [slides]
  1. Park, Joon Sung, et al. "Social simulacra: Creating populated prototypes for social computing systems." Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 2022.
  2. Park, Joon Sung, et al. "Generative agents: Interactive simulacra of human behavior." Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology. 2023.
  3. Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai (2023) “Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies”, ICLR

Policies

Late Days Each student can use 5 late days for HW assignments over the course of the semester. Late days can be distributed in any way across assignments. Additional extensions will not be granted, and work turned in late after all late days have been used will receive 0 credit. If a group assignment is turned in late, it will count as a late day for all students in the group. Late days cannot be used for the final project (e.g. HW 5) report.

Course Conduct This course includes topics that could raise differing opinions. All students are expected to respect everyone's perspective and input and to contribute towards creating a welcoming and inclusive climate. We, the instructors, will strive to make this classroom an inclusive space for all students, and we welcome feedback on ways to improve.

Academic Integrity This course will have a zero-tolerance philosophy regarding plagiarism or other forms of cheating, and incidents of academic dishonesty will be reported. A student who has doubts about how the Honor Code applies to this course should obtain specific guidance from the course instructor before submitting the respective assignment.

Discrimination and Harassment The Johns Hopkins University is committed to equal opportunity for its faculty, staff, and students. To that end, the university does not discriminate on the basis of sex, gender, marital status, pregnancy, race, color, ethnicity, national origin, age, disability, religion, sexual orientation, gender identity or expression, veteran status, military status, immigration status or other legally protected characteristic. The University's Discrimination and Harassment Policy and Procedures provides information on how to report or file a complaint of discrimination or harassment based on any of the protected statuses listed in the preceding sentence, and the University’s prompt and equitable response to such complaints.

Personal Well-being Take care of yourself! Being a student can be challenging and your physical and mental health is important. If you need support, please seek it out. Here are several of the many helpful resources on campus:

Acknowledgements Thank you Daniel Khashabi for sharing the course website template!